Abstract
Introduction: RNA profiling using microarrays has been used extensively in cancer research, but its clinical use remains limited because of poor reproducibility and difficulty in standardization.
Next-generation sequencing (NGS) technologies have advanced significantly in accuracy, reproducibility and simplicity. Targeted RNA enrichment followed by NGS provides a reliable and reproducible approach to RNA profiling that is amenable to implementation in clinical laboratories. We evaluated the utility of targeted enrichment RNA sequencing in the diagnosis of various hematologic neoplasms.
Methods: We analyzed RNA from fresh bone marrow or peripheral blood samples from patients with ALL (N=29), AML (N=18), and CLL (N=25), and from formalin-fixed paraffin-embedded (FFPE) samples of DLBCL (N=79), sarcoma (N=12) and various epithelial tumors (N=22). We used the TruSight RNA Pan-Cancer Panel (Illumina, San Diego, CA) to measure the expression of 1385 genes. Expression values were standardized to mean 0 and standard deviation 1. Fusions and mutations were not considered in this analysis.
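The standardization step can be illustrated with a minimal sketch. It assumes the z-score is computed per gene across samples and uses the population standard deviation; the abstract specifies neither choice, and the expression values below are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores, (x - mean) / standard deviation, for one gene."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the abstract does not say which
    if sigma == 0:
        # A gene with constant expression carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical expression values for one gene across five samples.
gene_expression = [12.0, 15.0, 9.0, 20.0, 14.0]
z_scores = standardize(gene_expression)
```

After this transformation each gene has mean 0 and standard deviation 1, so genes measured on very different scales contribute comparably to downstream models.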
Results: The expression profiles provided direct data on disease markers, including CD3, CD19, CD8 and PAX5, that can be used for distinguishing between diseases. Beyond these individual markers, statistical analysis of the full profiles distinguished between the diseases with high accuracy. We first focused on classifying ALL vs AML vs CLL, since the expression data for these groups were generated from fresh samples. Accounting for multiple hypothesis testing and setting the false discovery rate (FDR) at 0.05, we found 1074 genes to be statistically significant. Using the top 5 principal components, we built a multinomial logit model with a 93.1% accuracy rate based on leave-one-out cross-validation. Using a tree model with only the expression levels of the FANCI and CIITA genes, we were able to diagnose ALL, AML, and CLL with an 88.9% accuracy rate, also based on leave-one-out cross-validation.
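The abstract does not name the procedure used to control the FDR. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])

# Hypothetical per-gene p-values from a test of differences across groups.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p, fdr=0.05)
```

In this toy input only the two smallest p-values pass the step-up thresholds, even though five are below an unadjusted 0.05 cutoff.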
Similarly, for the three FFPE groups (DLBCL, solid epithelial tumor, and sarcoma), setting the FDR at 0.05 resulted in 1169 genes that were significantly different across the three groups. Using the top 5 principal components, we built a multinomial logit model with a 92.0% accuracy rate on leave-one-out testing. Using a simple tree model and the expression levels of the CD79A and MALAT1 genes, we were able to distinguish between the three groups with a 94.7% accuracy rate on leave-one-out testing.
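Leave-one-out testing holds out each sample in turn, fits the model on the remaining samples, and scores the prediction for the held-out sample. The sketch below shows that loop with a nearest-centroid classifier as a simple stand-in; it is not the multinomial logit or tree model reported here, and the two-gene values and group labels are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to the query point."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(features, labels, predict):
    """Hold out each sample once, train on the rest, and return the hit rate."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Hypothetical standardized expression of two genes in three groups.
features = [[-1.2, 0.9], [-1.0, 1.1], [-0.8, 1.0],
            [1.1, -0.9], [0.9, -1.2], [1.0, -1.0],
            [0.1, 0.0], [-0.1, 0.2], [0.0, -0.1]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
accuracy = leave_one_out_accuracy(features, labels, nearest_centroid_predict)
```

Because the held-out sample never contributes to the fitted model, the resulting accuracy estimates performance on unseen samples rather than on the training data.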
Combining the FFPE and fresh samples and analyzing all 6 groups, we found 1361 genes to be significant at an FDR of 0.05, accounting for multiple hypothesis testing. A multinomial logit model with the top 5 principal components provided 89.7% accuracy in classifying the 6 diseases using leave-one-out testing.
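As a consistency check on the reported figures, the accuracy rates can be related to the cohort sizes given in the Methods. The sketch below assumes each accuracy is the fraction of all samples in the relevant set that were classified correctly when held out; the correct counts it derives are inferred from the percentages and are not reported in the abstract.

```python
cohort_sizes = {"ALL": 29, "AML": 18, "CLL": 25,
                "DLBCL": 79, "sarcoma": 12, "epithelial": 22}

def implied_correct(n_samples, reported_percent):
    """Return every count k for which k / n rounds to the reported percentage."""
    return [k for k in range(n_samples + 1)
            if round(100 * k / n_samples, 1) == reported_percent]

fresh_n = sum(cohort_sizes[g] for g in ("ALL", "AML", "CLL"))
ffpe_n = sum(cohort_sizes[g] for g in ("DLBCL", "sarcoma", "epithelial"))
all_n = fresh_n + ffpe_n

# (set size, reported accuracy in percent) for each model in the Results.
reported = {
    "fresh, logit": (fresh_n, 93.1),
    "fresh, tree": (fresh_n, 88.9),
    "FFPE, logit": (ffpe_n, 92.0),
    "FFPE, tree": (ffpe_n, 94.7),
    "all six, logit": (all_n, 89.7),
}
implied = {name: implied_correct(n, pct) for name, (n, pct) in reported.items()}
```

Under that assumption, each reported percentage maps to exactly one whole number of correctly classified samples out of 72 fresh, 113 FFPE, or 185 combined samples.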
Conclusion: These data suggest that targeted enrichment RNA sequencing with quantification of gene expression provides reliable data that can be used for distinguishing between various diseases. This approach is applicable to both fresh and FFPE tissue. It also provides information on fusion genes and mutations. Although these sources of information were not used in this model, they could not only improve the model but also inform prognosis and potential response to therapy.
Ma: NeoGenomics: Employment. De Dios: NeoGenomics: Employment. Funari: NeoGenomics: Employment. Blocker: NeoGenomics: Employment. Albitar: NeoGenomics: Employment.